DETAILED ACTION

1. This Office action for US Patent Application 10/737,184 is in response to the 
Request for Continued Examination filed 07 August 2008, in reply to the Final Rejection 
of 02 May 2008. Currently, claims 2-19, 21-23, 25, 27-29, and 31-32 are pending. 
Claims 20, 24, 26, and 30 are newly canceled. 

2. In the previous Office action, claims 2-10 and 13-31 were rejected under 35 
U.S.C. 103(a) as obvious over US 5,802,226 A (Dischert et al.) in view of US 6,526,099 
B1 (Christopolous et al.). Claims 11, 12, and 32 were rejected under 35 U.S.C. 103(a) 
as obvious over Dischert et al. in view of Christopolous et al. and US 5,477,276 A 
(Oguro). 

Continued Examination Under 37 CFR 1.114 

3. A request for continued examination under 37 CFR 1.114, including the fee set 
forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this 
application is eligible for continued examination under 37 CFR 1.114, and the fee set 
forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action 
has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 07 
August 2008 has been entered. 
Information Disclosure Statement

The information disclosure statement filed 14 October 2008 fails to comply with the 
provisions of 37 CFR 1.98(b)(5) and MPEP § 609 because the documents cited therein 
are not identified by publication date including at least month and year. It has been 
placed in the application file, but the information referred to therein has not been 
considered as to the merits. Applicant is advised that the date of any re-submission of 
any item of information contained in this information disclosure statement or the 
submission of any missing element(s) will be the date of submission for purposes of 
determining compliance with the requirements based on the time of filing the statement, 
including all certification requirements for statements under 37 CFR 1.97(e). See 
MPEP § 609.05(a). 

Response to Arguments 

4. Applicant's arguments filed with respect to claims 2, 13, and 18 have been fully 
considered but they are not persuasive. 

5. Applicant first argues that Dischert et al. does not disclose transform coefficients
"representative of residual data" (pp. 8-9). It is respectfully submitted that while 
Dischert et al. does not explicitly mention residual data, it was commonly known in the 
art at the time of the present invention that in a DCT-based digital video codec such as 
the one described in Dischert et al., video frames may be classified into I frames, P 
frames, or B frames, of which I frames comprise independently-coded data, and P 
frames and B frames comprise motion vector data and residual data from motion 
compensating the I frames over time. The encoder of Christopolous et al. demonstrates
this process, for example with motion compensator 122 in figure 1 and motion compensator 307
in figure 3a. Christopolous et al. states, in column 1: lines 26-49 and 61-67; and
column 2: lines 1-14, that performing motion compensation on digital video data such as 
that found in Dischert et al. would greatly improve the compression ratio, achieving 
video of reasonable quality over a relatively narrow channel. Even if the encoder of
Dischert et al. does not teach motion compensation, it is respectfully submitted that it
would have been obvious to one having ordinary skill in the art at the time the invention
was made to modify Dischert et al. to operate on motion-compensated video, or to
modify Dischert et al. to perform motion compensation, with the predictable improved
result of a more efficiently compressed video that takes advantage of temporal
redundancy between frames, and thus can be transmitted over a narrower channel than
a series of independently coded digital video frames alone. It has been held that
applying a known technique to a known device, method, or product ready for
improvement to yield predictable results involves only routine skill in the art.
MPEP 2143(D); Dann v. Johnston, 425 U.S. 219, 189 USPQ 257 (1976); In re Nilssen,
851 F.2d 1401, 7 USPQ2d 1400 (Fed. Cir. 1988).

6. Applicant next states that it would be improper to make the above modification to 
Dischert et al., since such a modification would be incompatible with the shufflers of 
Dischert et al., and thus either render the modified Dischert et al. device inoperable, or 
require removal of the shufflers, which would constitute an improper substantial 
reconstruction and redesign of the Dischert device (pp. 9-11), and because of this, there 
would be no rationale to modify the references (pp. 11-13). Applicant further states that 
this would be the case regardless of whether the shuffling of Dischert et al. was
inter-frame or intra-frame.

First, it is respectfully submitted that even if the "shuffling" of Dischert et al. were
performed on an inter-frame basis, it would still be proper to combine Dischert et al.
with Christopolous et al. Christopolous et al. is directed to several video standards,
such as H.261 or H.263 (column 3: lines 15-17). It was known in the art at the time of
the present invention to transmit motion-compensated pictures in an order different from
the playback order, so that a picture that uses a temporally later picture as a reference
receives that reference picture before playback. See H.263 §§ O.1-O.2. This re-ordering
would be encompassed by the argued inter-frame "shuffling", yet it is compatible with the
motion-compensated coding of video standards such as those of Christopolous et al.

Second, it is respectfully submitted that even if the "shuffling" of Dischert et al. were a
rearrangement of portions of sub-frame data on physical tracks of a video cassette tape,
as argued, this would not make the shuffling incompatible with residual data.
Applicant states that in data shuffling, "the portion of the video data scanned in one
frame may not be the same portion that is scanned in the other frame". However, it
does not appear that the shuffling in either Dischert et al. or the Kim et al. reference
presented as evidence outputs scrambled video data in which one portion of data moves
about the display as it is repeatedly scanned and played, even if the data may be found
in a different portion of a track in different instances, as taught in Kim et al.
Applicant states on pages 10 and 11 that residual data must be based on reference data
in a physical scanned location that remains the same for each instance. For this
assumption to be valid, it must be inferred that the physical location of each repeated
scan of a portion of data must remain the same to enable slow-speed playback as well.
Although Kim et al. is focused mainly on recording and does not disclose details of a 
playback mode, it appears that during playback, the shuffled data is reconstructed back 
to an original order, presumably based on identification signal Id (column 5: line 18) and 
within buffer 30. It is respectfully submitted that, in a conventional shuffling
operation, the physical location of data on a recording medium does not
necessarily correspond with the location of the decoded data on a display.
Therefore, extracting residual data from video data that is shuffled on a recording
medium is possible.
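The reasoning above can be illustrated with a minimal sketch. It assumes that shuffling is a permutation of blocks in which each block carries a tag recording its original position (a hypothetical stand-in for identification signal Id of Kim et al.) and that each block is reduced to one number. Because playback restores the original order, a residual computed after unshuffling does not depend on where the data sat on the medium.

```python
import random

def shuffle_blocks(blocks, seed):
    """Hypothetical recording-side shuffle: permute blocks, tagging each with its original index."""
    order = list(range(len(blocks)))
    random.Random(seed).shuffle(order)
    return [(index, blocks[index]) for index in order]

def unshuffle_blocks(tagged):
    """Playback side: use the tags to put every block back in display order."""
    restored = [None] * len(tagged)
    for index, block in tagged:
        restored[index] = block
    return restored

def residual(current, reference):
    """Block-by-block difference between a frame and its reference frame."""
    return [c - r for c, r in zip(current, reference)]

reference = [10, 20, 30, 40]
current = [12, 19, 33, 40]
on_tape_ref = shuffle_blocks(reference, seed=1)
on_tape_cur = shuffle_blocks(current, seed=2)  # a different physical placement for each frame
res = residual(unshuffle_blocks(on_tape_cur), unshuffle_blocks(on_tape_ref))
print(res)  # [2, -1, 3, 0]
```

The seeds only change the physical placement; the residual is the same for any pair of seeds.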

Considering the above, all prior art rejections are maintained. 

Claim Objections 

7. Claim 21 is objected to under 37 CFR 1.75(c) as being of improper dependent
form for failing to further limit the subject matter of a previous claim. Applicant is 
required to cancel the claim(s), or amend the claim(s) to place the claim(s) in proper 
dependent form, or rewrite the claim(s) in independent form. Currently, claim 21 
depends on canceled claim 20. 

Claim Rejections - 35 USC § 101 

8. 35 U.S.C. 101 reads as follows: 

Whoever invents or discovers any new and useful process, machine, manufacture, or composition of 
matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the 
conditions and requirements of this title. 
9. Claims 2-12 are rejected under 35 U.S.C. 101 because the claimed invention is
directed to non-statutory subject matter. While the claims recite a series of steps or 
acts to be performed, a statutory "process" under 35 U.S.C. 101 must (1) be tied to 
another statutory category (such as a particular apparatus), or (2) transform underlying 
subject matter (such as an article or material) to a different state or thing. Ex parte
Langemyr, BPAI 2008-1495 (28 May 2008). The present claims neither transform 
underlying subject matter nor positively tie to another statutory category that 
accomplishes the claimed method steps, and therefore do not qualify as a statutory 
process. 

Claim Rejections - 35 USC § 112

10. The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 

11. Claims 27-32 are rejected under 35 U.S.C. 112, first paragraph, as failing to 
comply with the written description requirement. The claim(s) contains subject matter 
which was not described in the specification in such a way as to reasonably convey to 
one skilled in the relevant art that the inventor(s), at the time the application was filed, 
had possession of the claimed invention. Claims 27-32 are directed to a "computer 
readable storage medium", first claimed as such in the amendment of 09 October 2007. 
There is no support in the specification for the claimed "computer-readable storage 
medium", with the specification instead only briefly mentioning a "PC platform" on
page 1. Accordingly, the "computer-readable storage medium" constitutes new
matter.

Claim Rejections - 35 USC § 103 

12. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

13. Claims 2-10, 13-19, 21-23, 25, 27-29, and 31 are rejected under 35 U.S.C.
103(a) as being unpatentable over US 5,802,226 A (Dischert et al.) in view of US
6,526,099 B1 (Christopolous et al.). Dischert et al. teaches a video editor that operates
on frequency-domain video (abstract).

Regarding claim 2, figure 4 of Dischert et al. shows video streams inputted into 
analog/digital interfaces 402 and 404, and figure 5 shows video streams input into
digital VCR heads 418 and 526 from the helical track of a digital video cassette. In the 
recording apparatus of figure 4, the data is coded within coder 410, which contains a 
DCT module, as shown in figure 8 (column 6: lines 22-47). This DCT encoding 
corresponds with the claimed step of obtaining transform coefficients representative of 
video data. Next, the coded data is mixed with a secondary signal in mixer 80 (column
6: lines 39-47), producing a fade effect (column 7: lines 1-26). This corresponds with 
the claimed step of modifying the transform coefficients to achieve a video effect. 
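The mapping above treats a fade applied to DCT coefficients as equivalent to a fade applied to pixels. That holds because the DCT is linear. A minimal sketch of the property, assuming a 1-D orthonormal DCT-II on an 8-sample row and hypothetical weights `j` and `k` standing in for fading coefficients J and K (this is not the circuit of Dischert et al.):

```python
import math

def dct_ii(samples):
    """Orthonormal 1-D DCT-II of a list of samples."""
    n = len(samples)
    coeffs = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(samples))
        coeffs.append(scale * total)
    return coeffs

def mix(a, b, j, k):
    """Weighted mix j*a + k*b, element by element (hypothetical fade weights)."""
    return [j * x + k * y for x, y in zip(a, b)]

row_a = [52, 55, 61, 66, 70, 61, 64, 73]  # one row of pixels from the first source
row_b = [16] * 8                          # a flat row standing in for the second source
j, k = 0.75, 0.25

pixel_domain = dct_ii(mix(row_a, row_b, j, k))          # mix pixels, then transform
coeff_domain = mix(dct_ii(row_a), dct_ii(row_b), j, k)  # transform, then mix coefficients
assert all(math.isclose(p, c, abs_tol=1e-9) for p, c in zip(pixel_domain, coeff_domain))
```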
Dischert et al. is silent on residual video data or error video data. Christopoulos
et al. teaches a transcoder that operates in the spatial domain or the frequency domain
(abstract). Regarding the residual data in claim 2, Christopoulos et al. operates on
video that has been coded with motion-compensated predictive coding, according to a
standard video codec such as H.261 or H.263 (column 3: lines 15-17). In predictive
coding, rather than transmitting every pixel value, only the variation between
pixels is transmitted (column 1, lines 40-49).
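The idea of transmitting only the variation can be shown with a minimal sketch. It assumes a simple previous-sample predictor (lossless DPCM); motion-compensated codecs such as H.261 or H.263 instead predict from a displaced block of a reference picture, but the residual principle is the same.

```python
def dpcm_encode(samples):
    """Send the first sample, then each sample's difference from its predecessor."""
    residuals = [samples[0]]
    for prev, cur in zip(samples, samples[1:]):
        residuals.append(cur - prev)
    return residuals

def dpcm_decode(residuals):
    """Rebuild the samples by accumulating the transmitted differences."""
    samples = [residuals[0]]
    for r in residuals[1:]:
        samples.append(samples[-1] + r)
    return samples

row = [100, 101, 103, 103, 102]
res = dpcm_encode(row)  # [100, 1, 2, 0, -1]
assert dpcm_decode(res) == row
```

The small residual values are what an entropy coder can represent compactly.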

Dischert et al. discloses the claimed invention except for modifying residual error 
video data. Christopoulos et al. teaches that it was known to perform functions on 
predictive-coded video data. Therefore, it would have been obvious to one having 
ordinary skill in the art at the time the invention was made to modify the fade effect 
device of Dischert et al. to operate on predictive-coded video data, as taught by 
Christopoulos et al., since Christopoulos et al. states in column 1: lines 15-31, that such 
a modification would improve the compression ratio of a coded video signal. 

Regarding claim 3, in Christopolous et al., a predictive (P) frame or a bidirectional 
(B) frame comprises motion compensated data comprising motion vectors and 
prediction error data, in accordance with a video codec such as H.263 (column 14: line 
53-column 15: line 19). 

Regarding claim 4, the DCT operation in DCT 60 of Dischert et al. is considered a
technique of video compression. 
Regarding claim 5, the mixer of Dischert et al. operates over a time domain in
which coefficients J and K vary over time to produce the fade effect (column 7: lines
1-26).
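How coefficients that vary over time yield a fade can be sketched as follows. This assumes a linear schedule in which J falls from 1 to 0 while K = 1 - J rises; the actual schedule used by Dischert et al. is not stated here, and `crossfade_weights` and `mix_coefficients` are hypothetical names.

```python
def crossfade_weights(num_frames):
    """Linear schedule: J falls from 1 to 0 while K = 1 - J rises from 0 to 1."""
    if num_frames < 2:
        raise ValueError("need at least two frames")
    weights = []
    for t in range(num_frames):
        j = 1.0 - t / (num_frames - 1)
        weights.append((j, 1.0 - j))
    return weights

def mix_coefficients(coeffs_a, coeffs_b, j, k):
    """Weight two sets of transform coefficients and add them."""
    return [j * a + k * b for a, b in zip(coeffs_a, coeffs_b)]

schedule = crossfade_weights(5)
# [(1.0, 0.0), (0.75, 0.25), (0.5, 0.5), (0.25, 0.75), (0.0, 1.0)]
a, b = [80.0, 8.0], [0.0, 40.0]
frames = [mix_coefficients(a, b, j, k) for j, k in schedule]
```

The first output frame equals source `a` and the last equals source `b`; setting `b` to all zeros would give a simple fade-out of `a`.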

Regarding claim 6, as shown in figure 8 of Dischert et al., while a video signal 
may come from an uncompressed source that is encoded with the DCT transform in the 
mixer, a video signal may also be input into the mixer via a partial decoder comprising 
variable-length decoder 86, run-length decoder 84, and de-quantizer 82 (column 6: lines 
29-40). Then, Dischert et al. discloses performing an effect on decoded quantized 
transform coefficients and performing inverse quantization. 
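The partial-decoding path can be sketched in simplified form. This assumes a single uniform quantizer step for every coefficient (real codecs typically use per-coefficient weighting), and `fade_quantized` is a hypothetical name. Quantized levels are inverse quantized, scaled by a fade gain, and requantized, without returning to the pixel domain.

```python
def dequantize(levels, qstep):
    """Inverse quantization: quantized level -> approximate coefficient value."""
    return [level * qstep for level in levels]

def quantize(coeffs, qstep):
    """Uniform quantization, rounding to the nearest level."""
    return [int(round(c / qstep)) for c in coeffs]

def fade_quantized(levels, qstep, gain):
    """Apply a fade gain between inverse quantization and requantization."""
    coeffs = dequantize(levels, qstep)
    faded = [gain * c for c in coeffs]
    return quantize(faded, qstep)

print(fade_quantized([12, -4, 8, 0, 3], qstep=8, gain=0.25))  # [3, -1, 2, 0, 1]
```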

Regarding claim 7, in Dischert et al., figure 10A shows that in mixer 80, a video 
signal comprising transform coefficients is first scaled by a fading coefficient J or K 
before being mixed with another video signal. It is respectfully submitted that either a 
fading coefficient that is multiplied by a first signal or a second signal that is added to 
the multiplied first signal may be considered the claimed "editing data" according to the 
present invention. 

Regarding claim 8, Dischert et al. discloses that video data may be faded to 
black as part of a transition sequence (column 7, lines 5-9). 

Regarding claim 9, Dischert et al. discloses that video data may be faded to 
black as part of a transition sequence (column 7, lines 5-9). 

Regarding claim 10, Dischert et al. only teaches a fade to black. However, it 
would have been a matter of obvious design choice to one having ordinary skill in the 
art to fade to any desired color, since the applicant has not disclosed that fading to any 
arbitrary color, including white, solves any stated problem or is for any particular
purpose, and it appears the invention would perform equally well with fading to white. 

Regarding claim 13, figure 8 of Dischert et al. discloses dequantizer 82 in a video 
mixer that produces dequantized transform coefficients (column 6: line 40). This 
corresponds with the claimed "inverse quantizer". These transform coefficients are then 
combined with transform coefficients from another source in mixer 80 (column 6: lines 
40-47) to produce a fade effect. Then, mixer 80 corresponds with the claimed 
"summer", and the mixed signal corresponds with the claimed "further data". 

Regarding claim 14, figure 8 of Dischert et al. discloses variable quantizer 62 that 
performs quantizing on the mixed signal (column 6: line 26). 

Regarding claim 15, in Christopolous et al., a decoder such as for example one 
shown in the transform-domain transcoder of figure 9 includes a transform domain 
motion compensation module TD/MC. In the combination with Dischert et al., this would 
be added to the datapath of figure 8 after dequantizer 82. Then, this motion 
compensation module corresponds with the claimed "predictor", and the DCT 60 of 
Dischert et al., which would provide "editing data" relative to the partially decoded data, 
corresponds with the claimed "transform module". 

Regarding claim 16, in Dischert et al., figure 10A shows that in mixer 80, a video 
signal comprising transform coefficients is first scaled by a fading coefficient J or K 
before being mixed with another video signal. It is respectfully submitted that either a 
fading coefficient that is multiplied by a first signal or a second signal that is added to
the multiplied first signal may be considered the claimed "editing data" according to the 
present invention. 

Regarding claim 17, summer 80 in Dischert et al. combines transform coefficients 
according to coefficients J and K which vary over time to produce the fade effect 
(column 7: lines 1-26). Then, coefficient J or K corresponds with the claimed "editing 
data" that produces a video effect "in a time domain". 

Regarding claim 18, this claim, and dependent claims 19, 21-23, and 25, are in 
means-plus-function format and so 35 U.S.C. 112, sixth paragraph, applies. Then, 
these claims must be interpreted as particular to the structure disclosed in the 
specification. In re Donaldson Co., 16 F.3d 1189, 29 USPQ2d 1845 (Fed. Cir. 1994). 
In the present case, the datapath of figures 8 and 10A of Dischert et al. comprising 
dequantizer 82, multiplier 104, adder 105, and quantizer 62 is considered analogous to 
the datapath of figure 4 of the present invention comprising inverse quantizer 20, 
multiplier 22, adder 24, and quantizer 26. With particular regard to the limitations of claim 18,
ECC decoder 512 of Dischert et al., which extracts a digital video signal from a
bitstream comprising audio and video data (column 5: lines 24-26) and provides the
video signal to mixer 80 (column 6: lines 29-34), corresponds with the claimed means
for providing a bitstream indicative of video data, considered as demultiplexer 10 in
figure 4 of the present invention. Mixer 80, which performs a partial decoding of the
DCT coefficients and combines the digital video with a fading coefficient and another
video bitstream to produce a fade effect, corresponds with the claimed means for
obtaining transform coefficients and combining editing data to produce a modified
bitstream, considered as editing module 5 in figure 4 of the present invention.

Regarding claim 19, dequantizer 82 of Dischert et al. corresponds with the 
claimed inverse quantization module. 

Regarding claim 21, mixer 80 of Dischert et al. corresponds with the claimed 
combining module. 

Regarding claim 22, the examiner takes Official Notice that video cameras were 
well-known at the time of the invention as a source for providing video data, such as to 
an analog/digital interface of Dischert et al. 

Regarding claim 23, Christopoulos et al. teaches that it was known to input digital 
video from a receiver (column 9, lines 11-13, 19-35).

Regarding claim 25, since the specification of the present invention does not 
describe or limit the structure of a storage medium (column 14: lines 21-23), the video 
cassette of Dischert et al. is considered to be encompassed by the claimed means for 
storing a video signal. 

Regarding claim 27, at least Christopolous et al. may be implemented in 
hardware or software (column 8: lines 31-32, 66-67). 

Regarding claim 28, in Dischert et al., a set of transform coefficients is multiplied 
by a fade coefficient J or K (column 6: line 67-column 7: line 11).

Regarding claim 29, in Dischert et al., two modified sets of transform coefficients 
are added to produce a final mixed video stream (column 7: lines 11-12). 
Regarding claim 31, Dischert et al. discloses that video data may be faded to
black as part of a transition sequence (column 7, lines 5-9). 

14. Claims 11-12 and 32 are rejected under 35 U.S.C. 103(a) as being unpatentable 
over Dischert et al. in view of Christopoulos et al. as applied to claims 1 and 27 above, 
and further in view of US Patent 5,477,276A (Oguro). Although Dischert et al. teaches 
a video editor that performs basic operations such as a dissolve, a cross-fade, and a 
fade to black on frequency-domain data, it does not teach advanced editing effects. 
Oguro teaches a DSP apparatus that performs advanced fading effects. Regarding the
fade from one color to another in claims 11 and 32, Oguro can fade in or fade out to any
arbitrary color (column 11, lines 22-27; lines 46-51). Regarding the fade to
monochrome in claim 12, the fade system of Oguro may operate only on Y (luminance) 
values and not process C (chrominance) values, thus performing only black-and-white 
fade operations (column 11, lines 6-21). 
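The two kinds of fade discussed above can be sketched per pixel. This assumes 8-bit YCbCr values with video-range white at (235, 128, 128) and a blend factor `alpha` between 0 and 1; it illustrates the concept and is not the DSP circuit of Oguro. The luminance-only variant mirrors the description of processing Y while leaving C unprocessed.

```python
def fade_to_color(pixel, target, alpha):
    """Blend a (Y, Cb, Cr) pixel toward a target color; alpha=0 keeps the pixel, alpha=1 gives the target."""
    return tuple((1 - alpha) * p + alpha * t for p, t in zip(pixel, target))

def fade_luma_only(pixel, target_y, alpha):
    """Fade only the luminance (Y) value; Cb and Cr pass through unprocessed."""
    y, cb, cr = pixel
    return ((1 - alpha) * y + alpha * target_y, cb, cr)

WHITE = (235, 128, 128)  # assumed 8-bit video-range white
print(fade_to_color((100, 128, 128), WHITE, 0.5))  # (167.5, 128.0, 128.0)
```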

Dischert et al., in combination with Christopoulos et al., discloses the claimed 
invention except for advanced fading techniques. Oguro teaches that it was known to 
perform fading techniques such as a fade to color or monochromatic fade. Therefore, it 
would have been obvious to one having ordinary skill in the art at the time the invention
was made to apply the fading of Oguro to the editor of Dischert et al., since Oguro 
states in column 11, lines 29-51 that such a modification would simplify the circuitry 
needed in a fading device. 
Conclusion

15. This action is non-final due to the new rejection of claims 2-12 under 35 U.S.C.
101.

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to David N. Werner, whose telephone number is
(571) 272-9662. The examiner can normally be reached on Monday-Friday from 10:00-6:30.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Mehrdad Dastouri, can be reached on (571) 272-7418. The fax phone
number for the organization where this application or proceeding is assigned is 571- 
273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 

/D. N. W./

Examiner, Art Unit 2621 
Supervisory Patent Examiner, Art Unit 2621
TR is TRP is not available at the decoder, the decoder may send a forced INTRA update signal to the
encoder by external means (for example, Recommendation H.245). Unless a different frame storage 
policy is negotiated by external means, correctly decoded video picture segments shall be stored into 
memory for use as later reference pictures on a first-in, first-out basis as shown in Figure N.1 (except
for B-pictures, which are not used as reference pictures), and video picture segments which are 
detected as having been incorrectly decoded should not replace correctly decoded ones in this 
memory area. 
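The storage policy above can be modelled as a small FIFO. This is a minimal sketch under stated assumptions: the store holds only temporal reference (TR) values instead of picture data, its capacity is an assumed parameter (the text leaves other policies to external negotiation), and `ReferenceStore` is a hypothetical name.

```python
from collections import deque

class ReferenceStore:
    """Hypothetical FIFO store of reference picture segments, keyed by temporal reference (TR)."""

    def __init__(self, capacity):
        # With maxlen set, appending to a full deque drops the oldest entry (first-in, first-out).
        self.pictures = deque(maxlen=capacity)

    def store(self, tr, picture_type, decoded_ok):
        # B-pictures are never used as references, and incorrectly decoded
        # segments must not displace correctly decoded ones.
        if picture_type == "B" or not decoded_ok:
            return False
        self.pictures.append(tr)
        return True

    def has(self, tr):
        return tr in self.pictures

store = ReferenceStore(capacity=2)
store.store(0, "I", True)
store.store(1, "P", True)
store.store(2, "B", True)   # ignored: not a reference picture
store.store(3, "P", False)  # ignored: decoding error detected
store.store(4, "P", True)   # evicts TR 0, the oldest entry
```

After these calls the store holds TR 1 and TR 4. A decoder asked to predict from TR 0 would then have to request an INTRA update, as described above.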

An Acknowledgment message (ACK) and a Non-Acknowledgment message (NACK) are defined as 
back-channel messages. An ACK may be returned when the decoder decodes a video picture 
segment successfully. NACKs may be returned when the decoder fails to decode a video picture 
segment, and may continue to be returned until the decoder gets the expected forward channel data 
which includes the requested TRP or an INTRA update. Which types of message shall be sent is 
indicated in the RPSMF field of the picture header of the forward-channel data. 

In a usage scenario known as "Video Redundancy Coding", the Reference Picture Selection mode 
may be used by some encoders in a manner in which more than one representation is sent for the 
pictured scene at the same temporal instant (usually using different reference pictures). In such a case 
in which the Reference Picture Selection mode is in use and in which adjacent pictures in the 
bitstream have the same temporal reference, the decoder shall regard this occurrence as an indication 
that redundant copies have been sent of approximately the same pictured scene content, and shall 
decode and use the first such received picture while discarding the subsequent redundant picture(s). 
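The Video Redundancy Coding rule can be sketched as a filter over the received pictures. This assumes the Reference Picture Selection mode is in use and that each picture is represented as a (TR, payload) pair; `drop_redundant` is a hypothetical name.

```python
def drop_redundant(pictures):
    """Keep the first of any run of adjacent pictures that share a temporal reference (TR)."""
    kept = []
    prev_tr = None
    for tr, payload in pictures:
        if tr == prev_tr:
            continue  # a redundant copy of the same temporal instant: discard it
        kept.append((tr, payload))
        prev_tr = tr
    return kept

received = [(0, "a"), (1, "b1"), (1, "b2"), (2, "c")]
print(drop_redundant(received))  # [(0, 'a'), (1, 'b1'), (2, 'c')]
```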

ANNEX O 

Temporal, SNR, and Spatial Scalability mode 

This annex describes the optional mode of this Recommendation in support of Temporal, SNR, and 
Spatial Scalability. This mode may also be used in conjunction with error control schemes. The 
capability of this mode and the extent to which its features are supported is signaled by external 
means (for example, Recommendation H.245). The use of this mode is indicated in PLUSPTYPE. 

O.1 Overview

Scalability allows for the decoding of a sequence at more than one quality level. This is done by 
using a hierarchy of pictures and enhancement pictures partitioned into one or more layers. There are 
three types of pictures used for scalability: B-, EI-, and EP-pictures, as explained below. Each of 
these has an enhancement layer number ELNUM that indicates to which layer it belongs, and a 
reference layer number RLNUM that indicates which layer is used for its prediction. The lowest layer 
is called the base layer, and has layer number 1.

Scalability is achieved by three basic methods: temporal, SNR, and spatial enhancement.

O.1.1 Temporal scalability

Temporal scalability is achieved using bidirectionally predicted pictures, or B-pictures. B-pictures 
allow prediction from either or both a previous and subsequent reconstructed picture in the reference 
layer. This property generally results in improved compression efficiency as compared to that of 
P-pictures. These B-pictures differ from the B-picture part of a PB- (or Improved PB-) frame 
(see Annexes G and M) in that they are separate entities in the bitstream: they are not syntactically 
intermixed with a subsequent P- (or EP-) picture. 
B-pictures (and the B-part of PB- or Improved PB-frames) are not used as reference pictures for the
prediction of any other pictures. This property allows for B-pictures to be discarded if necessary
without adversely affecting any subsequent pictures, thus providing temporal scalability. Figure O.1
illustrates the predictive structure of P- and B-pictures. 
Figure O.1/H.263 - Illustration of B-picture prediction dependencies



The location of B-pictures in the bitstream is in a data-dependence order rather than in strict temporal 
order. (This rule is consistent with the ordering of other pictures in the bitstream, but for all picture 
types other than the B-picture, no such conflict arises between the data-dependence order and the 
temporal order.) For example, if the pictures of a video sequence were numbered 1, 2, 3, ..., then the
bitstream order of the encoded pictures would be I₁, P₃, B₂, P₅, B₄, ..., where the subscript refers to
the original picture number (as illustrated in Figure O.1).
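The reordering in this example can be reproduced with a short sketch. It assumes that each B-picture is predicted from the reference pictures on either side of it, as in Figure O.1, so it is held back until the next non-B picture has been emitted. `bitstream_order` is a hypothetical helper, not part of the Recommendation.

```python
def bitstream_order(display_types):
    """Map display-order picture types to bitstream order, delaying each B-picture
    until the later reference picture it depends on has been sent."""
    order = []
    pending_b = []
    for number, ptype in enumerate(display_types, start=1):
        label = f"{ptype}{number}"
        if ptype == "B":
            pending_b.append(label)  # wait for the next reference picture
        else:
            order.append(label)
            order.extend(pending_b)
            pending_b = []
    if pending_b:
        raise ValueError("trailing B-pictures have no subsequent reference picture")
    return order

print(bitstream_order(["I", "B", "P", "B", "P"]))  # ['I1', 'P3', 'B2', 'P5', 'B4']
```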

There is no limit to the number of B-pictures that may be inserted between pairs of reference pictures
in the reference layer (other than what is necessary to prevent temporal ambiguity from overflows of 
the temporal reference field in the picture header). However, a maximum number of such pictures 
may be signaled by external means (for example, Recommendation H.245). 

The picture height, width, and pixel aspect ratio of a B-picture shall always be equal to those of its 
temporally subsequent reference layer picture. 

Motion vectors are allowed to extend beyond the picture boundaries of B-pictures.

O.1.2 SNR scalability

The other basic method to achieve scalability is through spatial/SNR enhancement. Spatial scalability
and SNR scalability are equivalent except for the use of interpolation, as is described shortly. Because
compression introduces artifacts and distortions, the difference between a reconstructed picture and
its original in the encoder is (nearly always) a nonzero-valued picture, containing what can be called
the coding error. Normally, this coding error is lost at the encoder and never recovered. With SNR
scalability, these coding error pictures can also be encoded and sent to the decoder, producing an 
enhancement to the decoded picture. The extra data serves to increase the signal-to-noise ratio of the 
video picture, and hence, the term SNR scalability. Figure O.2 illustrates the data flow for SNR
scalability. The vertical arrows from the lower layer illustrate that the picture in the enhancement
layer is predicted from a reconstructed approximation of that picture in the reference (lower) layer.
If prediction is only formed from the lower layer, then the enhancement layer picture is referred to as
an EI-picture. It is possible, however, to create a modified bidirectionally predicted picture using
both a prior enhancement layer picture and a temporally simultaneous lower layer reference picture.
This type of picture is referred to as an EP-picture or "Enhancement" P-picture. The prediction flow
for EI- and EP-pictures is shown in Figure O.2. (Although not specifically shown in Figure O.2, an
EI-picture in an enhancement layer may have a P-picture as its lower layer reference picture, and an
EP-picture may have an I-picture as its lower-layer enhancement picture.)
Figure O.2/H.263 - Illustration of SNR scalability



For both EI- and EP-pictures, the prediction from the reference layer uses no motion vectors. 
However, as with normal P-pictures, EP-pictures use motion vectors when predicting from their 
temporally prior reference picture in the same layer. 
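The SNR layering idea described in O.1.2 can be shown on a few sample values. In this minimal sketch a coarse uniform quantizer stands in for the lossy base-layer coding, and a finer quantizer stands in for the coding of the enhancement layer. The step sizes 16 and 4 are arbitrary assumptions, and no transform or motion prediction is modelled.

```python
def quantize(values, step):
    """Coarse uniform quantizer standing in for the lossy coding of one layer."""
    return [step * round(v / step) for v in values]

original = [101, 87, 64, 150]
base = quantize(original, 16)                           # base-layer reconstruction
coding_error = [o - b for o, b in zip(original, base)]  # normally lost at the encoder
enhancement = quantize(coding_error, 4)                 # coded and sent as an enhancement layer
improved = [b + e for b, e in zip(base, enhancement)]   # decoder adds the two layers

print(base, improved)  # [96, 80, 64, 144] [100, 88, 64, 152]
```

The largest error against the original falls from 7 with the base layer alone to 2 once the enhancement layer is added.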

O.1.3 Spatial scalability

The third and final scalability method in the Temporal, SNR, and Spatial Scalability mode is spatial 
scalability, which is closely related to SNR scalability. The only difference is that before the picture 
in the reference layer is used to predict the picture in the spatial enhancement layer, it is interpolated 
by a factor of two either horizontally or vertically (1-D spatial scalability), or both horizontally and 
vertically (2-D spatial scalability). The interpolation filters for this operation are defined in O.6. For a
decoder to be capable of some forms of spatial scalability, it may also need to be capable of custom 
picture formats. For example, if the base layer is sub-QCIF (128 x 96), the 2-D spatial enhancement 
layer picture would be 256 x 192, which does not correspond to a standard picture format. Another 
example would be if the base layer were QCIF (176 x 144), with the standard pixel aspect ratio of 
12:11. A 1-D horizontal spatial enhancement layer would then correspond to a picture format of 
352 x 144 with a pixel aspect ratio of 6:11. Thus a custom picture format would have to be used for
the enhancement layer in these cases. An example which does not require a custom picture format 
would be the use of a QCIF base layer with a CIF 2-D spatial enhancement layer. Spatial scalability 
is illustrated in Figure O.3. 
Figure O.3/H.263 - Illustration of spatial scalability
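The picture-format arithmetic in the spatial scalability examples can be checked with a short sketch. It assumes luminance dimensions, the standard H.263 sizes from sub-QCIF to 16CIF with their common 12:11 pixel aspect ratio, and hypothetical helper names (`spatial_enhancement_format`, `needs_custom_format`). The 1-D vertical case (pixel aspect ratio doubled) follows from the same reasoning as the horizontal case but is not one of the text's examples.

```python
from fractions import Fraction

# Standard H.263 source formats (luminance width x height); all use a 12:11 pixel aspect ratio.
STANDARD_FORMATS = {
    (128, 96): "sub-QCIF",
    (176, 144): "QCIF",
    (352, 288): "CIF",
    (704, 576): "4CIF",
    (1408, 1152): "16CIF",
}
STANDARD_PAR = Fraction(12, 11)


def spatial_enhancement_format(width, height, par, mode):
    """Picture size and pixel aspect ratio (width:height) of a spatial enhancement layer.

    mode: "horizontal" or "vertical" for 1-D scalability, "2d" for 2-D scalability.
    Doubling the sample count in one direction over the same display area halves
    the pixel's extent in that direction.
    """
    if mode == "horizontal":
        return width * 2, height, par / 2
    if mode == "vertical":
        return width, height * 2, par * 2
    if mode == "2d":
        return width * 2, height * 2, par
    raise ValueError(f"unknown mode: {mode!r}")


def needs_custom_format(width, height, par):
    """True unless the picture is a standard size with the standard 12:11 aspect ratio."""
    return (width, height) not in STANDARD_FORMATS or par != STANDARD_PAR


for base, mode in [((128, 96), "2d"), ((176, 144), "horizontal"), ((176, 144), "2d")]:
    fmt = spatial_enhancement_format(*base, STANDARD_PAR, mode)
    print(base, mode, fmt[:2], fmt[2], needs_custom_format(*fmt))
```

Run as written, the three cases match the text: the 2-D enhancement of sub-QCIF (256 x 192) and the 1-D horizontal enhancement of QCIF (352 x 144, 6:11) need custom picture formats, while the 2-D enhancement of QCIF is standard CIF.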



Other than requiring an upsampling process to increase the size of the reference layer picture prior to 
its use as a reference for the encoding process, the processing and syntax for a spatial scalability 
picture is functionally identical to that for an SNR scalability picture. 

Since there is very little syntactical distinction between pictures using SNR scalability and pictures 
using spatial scalability, the pictures used for either purpose are called EI- and EP-pictures. 

The picture in the base layer which is used for upward prediction in an EI- or EP-picture may be an
I-picture, a P-picture, or the P-part of a PB- or Improved PB-frame (but shall not be a B-picture or the
B-part of a PB- or Improved PB-frame).

O.1.4 Multilayer scalability

It is possible not only for B-pictures to be temporally inserted between pictures of types I, P, PB, and 
Improved PB, but also between pictures of types EI and EP, whether these consist of SNR or spatial 
enhancement pictures. It is also possible to have more than one SNR or spatial enhancement layer in 
conjunction with a base layer. Thus, a multilayer scalable bitstream can be a combination of SNR 
layers, spatial layers, and B-pictures. The size of a picture cannot decrease, however, with increasing 
layer number. It can only stay the same or increase by a factor of two in one or both dimensions. 
Figure O.4 illustrates a multilayer scalable bitstream.
Figure O.4/H.263 - Illustration of multilayer scalability
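The size restriction on layers can be written as a per-dimension check. A minimal sketch, assuming for simplicity that each layer is predicted from the layer directly below it (in general the RLNUM field identifies the reference layer); the helper names are hypothetical.

```python
def valid_layer_step(ref_size, enh_size):
    """Multilayer size rule between a reference layer and the layer predicted from it.

    Each dimension must stay the same or double; it may never decrease.
    """
    return all(e in (r, 2 * r) for r, e in zip(ref_size, enh_size))


def valid_layer_stack(sizes):
    """sizes: (width, height) per layer, base layer first.

    Assumes (for this sketch) each layer uses the layer directly below it as its reference layer.
    """
    return all(valid_layer_step(lo, hi) for lo, hi in zip(sizes, sizes[1:]))


# QCIF base layer, an SNR layer (same size), then a 2-D spatial layer (CIF).
print(valid_layer_stack([(176, 144), (176, 144), (352, 288)]))  # True
```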



In the case of multilayer scalability, the picture in a reference layer which is used for upward 
prediction in an EI- or EP-picture may be an I-, P-, EI-, or EP-picture, or may be the P-part of a PB- or
Improved PB-frame in the base layer (but shall not be a B-picture or the B-part of a PB- or Improved 
PB-frame). 
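The allowed sources for upward prediction can be expressed as a predicate. In this sketch the string labels for picture types and PB-frame parts, and the function name, are hypothetical. In the two-layer case the reference layer is the base layer, so the same predicate also captures the rule stated earlier for that case.

```python
# Labels for picture types and PB-frame parts are assumptions of this sketch.
FRAME_P_PARTS = {"PB:P-part", "ImprovedPB:P-part"}
B_TYPES = {"B", "PB:B-part", "ImprovedPB:B-part"}


def may_be_upward_reference(ptype, in_base_layer):
    """Can this reference-layer picture be used for upward prediction by an EI- or EP-picture?"""
    if ptype in B_TYPES:
        return False               # B-pictures and B-parts are never upward references
    if ptype in FRAME_P_PARTS:
        return in_base_layer       # P-part of a PB- or Improved PB-frame, base layer only
    return ptype in {"I", "P", "EI", "EP"}


print(may_be_upward_reference("ImprovedPB:P-part", in_base_layer=True))  # True
```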

As with the two-layer case, B-pictures may occur in any layer. However, any picture in an 
enhancement layer which is temporally simultaneous with a B-picture in its reference layer must be a 
B-picture or the B-part of a PB- or Improved PB-frame. This is to preserve the disposable
nature of B-pictures. Note, however, that B-pictures may occur in layers that have no corresponding 
picture in lower layers. This allows an encoder to send enhancement video with a higher picture rate 
than the lower layers. 
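The disposability rule can be checked over a set of pictures keyed by layer and temporal reference. This is a hedged sketch: the `Pic` record and the function name are hypothetical, layers are plain integers standing in for ELNUM/RLNUM, and a missing entry in the reference layer is treated as "no simultaneous picture", which is the higher-picture-rate case the text allows.

```python
from typing import NamedTuple, Optional

B_TYPES = {"B", "PB:B-part", "ImprovedPB:B-part"}


class Pic(NamedTuple):
    layer: int                  # layer number (as in ELNUM); larger means higher
    tr: int                     # temporal reference
    ptype: str                  # e.g. "I", "P", "EI", "EP", "B"
    ref_layer: Optional[int]    # reference layer (as in RLNUM); None in the base layer


def b_pictures_stay_disposable(pics):
    """True if no non-B picture sits on top of a temporally simultaneous B-picture
    in its reference layer (which would make that B-picture non-disposable)."""
    type_at = {(p.layer, p.tr): p.ptype for p in pics}
    for p in pics:
        if p.ref_layer is None or p.ref_layer == p.layer:
            continue
        below = type_at.get((p.ref_layer, p.tr))   # None: nothing simultaneous below
        if below in B_TYPES and p.ptype not in B_TYPES:
            return False
    return True


pics = [Pic(1, 0, "I", None), Pic(1, 1, "B", None), Pic(1, 2, "P", None),
        Pic(2, 0, "EI", 1), Pic(2, 1, "B", 2), Pic(2, 2, "EP", 1)]
print(b_pictures_stay_disposable(pics))  # True
```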

The enhancement layer number and the reference layer number for each enhancement picture (B-, EI-,
or EP-) are indicated in the ELNUM and RLNUM fields, respectively, of the picture header (when
present). See the inference rules described in 5.1.4.4 for when these fields are not present. If a
B-picture appears in an enhancement layer in which temporally surrounding SNR or spatial scalability
pictures also appear, the Reference Layer Number (RLNUM) of the B-picture shall be the same as
the Enhancement Layer Number (ELNUM).

The picture height, width, and pixel aspect ratio of a B-picture shall always be equal to those of its 
temporally subsequent reference layer picture. 
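The two B-picture header constraints above can be sketched as one check. The parameter names, the `PictureFormat` record, and the boolean flag for "temporally surrounded by EI- or EP-pictures in the same layer" are assumptions of this illustration; deriving that flag from an actual bitstream is not modeled.

```python
from fractions import Fraction
from typing import NamedTuple


class PictureFormat(NamedTuple):
    width: int
    height: int
    par: Fraction   # pixel aspect ratio, width:height


def b_picture_header_ok(elnum, rlnum, b_format, next_ref_format,
                        surrounded_by_ei_ep):
    """Check the RLNUM and picture-format constraints on a B-picture.

    surrounded_by_ei_ep: True if EI-/EP-pictures temporally surround this
        B-picture in its own enhancement layer (an assumed input).
    next_ref_format: format of the temporally subsequent reference-layer picture.
    """
    if surrounded_by_ei_ep and rlnum != elnum:
        return False          # RLNUM must equal ELNUM in this case
    # Width, height and pixel aspect ratio must all match.
    return b_format == next_ref_format


cif = PictureFormat(352, 288, Fraction(12, 11))
print(b_picture_header_ok(2, 2, cif, cif, surrounded_by_ei_ep=True))   # True
print(b_picture_header_ok(2, 1, cif, cif, surrounded_by_ei_ep=True))   # False
```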



O.2 Transmission order of pictures

Pictures which are dependent on other pictures shall be located in the bitstream after the pictures on 
which they depend. 

The bitstream syntax order is specified such that for reference pictures (i.e., pictures having types I, P,
EI, or EP, or the P-part of a PB- or Improved PB-frame), the following two rules shall be obeyed:

1) All reference pictures with the same temporal reference shall appear in the bitstream in 
increasing enhancement layer order (since each lower layer reference picture is needed to 
decode the next higher layer reference picture). 
2) All temporally simultaneous reference pictures as discussed in item 1) above shall appear in
the bitstream prior to any B-pictures for which any of these reference pictures is the first 
temporally subsequent reference picture in the reference layer of the B-picture (in order to 
reduce the delay of decoding all reference pictures which may be needed as references for 
B-pictures). 

Then, the B-pictures with earlier temporal references shall follow (temporally ordered within each 
enhancement layer). 
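One way to produce an order that follows the two reference-picture rules and the sentence above is to walk the reference pictures by temporal reference and emit each group's dependent B-pictures right after it. This is a minimal sketch, not a normative algorithm: PB- and Improved PB-frames are not modeled, the `Pic` record and function names are hypothetical, and every B-picture is assumed to have a temporally subsequent reference picture in its reference layer. It yields one allowable order; others may exist.

```python
from typing import NamedTuple


class Pic(NamedTuple):
    name: str
    layer: int       # enhancement layer number (as in ELNUM); larger means higher
    tr: int          # temporal reference
    is_b: bool       # True for B-pictures; False for reference pictures (I, P, EI, EP)
    ref_layer: int   # reference layer number (as in RLNUM); base-layer pictures use their own layer


def next_reference(b, pics):
    """First temporally subsequent reference picture in b's reference layer."""
    later = [p for p in pics if not p.is_b and p.layer == b.ref_layer and p.tr > b.tr]
    return min(later, key=lambda p: p.tr)


def transmission_order(pics):
    refs = sorted((p for p in pics if not p.is_b), key=lambda p: (p.tr, p.layer))
    order = []
    for tr in sorted({p.tr for p in refs}):
        # Rule 1: simultaneous reference pictures in increasing layer order.
        order += [p for p in refs if p.tr == tr]
        # Rule 2, then the B-pictures whose next reference picture has this temporal
        # reference, temporally ordered within each layer.
        due = [b for b in pics if b.is_b and next_reference(b, pics).tr == tr]
        order += sorted(due, key=lambda b: (b.layer, b.tr))
    return [p.name for p in order]


pics = [
    Pic("I0", 1, 0, False, 1), Pic("B1", 1, 1, True, 1), Pic("P2", 1, 2, False, 1),
    Pic("EI0", 2, 0, False, 1), Pic("B1e", 2, 1, True, 2), Pic("EP2", 2, 2, False, 1),
]
print(transmission_order(pics))  # ['I0', 'EI0', 'P2', 'EP2', 'B1', 'B1e']
```

The example uses a base layer (I0, B1, P2) and one SNR enhancement layer (EI0, B1e, EP2); the picture names are made up for illustration.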

The bitstream location of each B-picture shall comply with the following rules: 

1) Its bitstream location shall be after that of its first temporally subsequent reference picture in 
the reference layer (since the decoding of the B-picture generally depends on the prior 
decoding of that reference picture). 

2) Its bitstream location shall be after that of all reference pictures that are temporally 
simultaneous with the first temporally subsequent reference picture in the reference layer 
(in order to reduce the delay of decoding all reference pictures which may be needed as 
references for B-pictures). 

3) Its bitstream location shall precede the location of any additional temporally subsequent 
pictures other than B-pictures in its reference layer (since to allow otherwise would increase 
picture-storage memory requirements for the reference layer pictures). 

4) Its bitstream location shall be after that of all EI- and EP-pictures that are temporally 
simultaneous with the first temporally subsequent reference picture. 

5) Its bitstream location shall precede the location of all temporally subsequent pictures within 
its same enhancement layer (since to allow otherwise would introduce needless delay and 
increase picture-storage memory requirements for the enhancement layer). 
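A candidate order can also be checked against the B-picture placement rules. In this hedged sketch, rule 4 is covered by rule 2 because EI- and EP-pictures are reference pictures. Rule 5 is read as applying to the later pictures of the B-picture's own layer other than the reference pictures that rules 1 and 2 already require it to follow; read literally, rule 5 would conflict with rule 1 when the B-picture's reference layer is its own layer. The `Pic` record, `next_reference`, and `b_placement_violations` are hypothetical names, and PB-frames are not modeled.

```python
from typing import NamedTuple


class Pic(NamedTuple):
    name: str
    layer: int       # enhancement layer number (as in ELNUM)
    tr: int          # temporal reference
    is_b: bool       # True for B-pictures, False for reference pictures (I, P, EI, EP)
    ref_layer: int   # reference layer number (as in RLNUM)


def next_reference(b, pics):
    """First temporally subsequent reference picture in b's reference layer."""
    later = [p for p in pics if not p.is_b and p.layer == b.ref_layer and p.tr > b.tr]
    return min(later, key=lambda p: p.tr)


def b_placement_violations(order, pics):
    """Return (b_name, rule) pairs for B-pictures that break the placement rules."""
    pos = {name: i for i, name in enumerate(order)}
    problems = []
    for b in (p for p in pics if p.is_b):
        nxt = next_reference(b, pics)
        # Rules 1, 2 and 4: follow nxt and every reference picture simultaneous with it.
        simultaneous = [p for p in pics if not p.is_b and p.tr == nxt.tr]
        if any(pos[b.name] < pos[p.name] for p in simultaneous):
            problems.append((b.name, "1/2/4"))
        # Rule 3: precede any additional later non-B picture in the reference layer.
        later_refs = [p for p in pics
                      if not p.is_b and p.layer == b.ref_layer and p.tr > nxt.tr]
        if any(pos[b.name] > pos[p.name] for p in later_refs):
            problems.append((b.name, "3"))
        # Rule 5 (as interpreted here): precede the other later pictures of its own layer.
        later_same = [p for p in pics
                      if p.layer == b.layer and p.tr > b.tr and p not in simultaneous]
        if any(pos[b.name] > pos[p.name] for p in later_same):
            problems.append((b.name, "5"))
    return problems


pics = [
    Pic("I0", 1, 0, False, 1), Pic("B1", 1, 1, True, 1), Pic("P2", 1, 2, False, 1),
    Pic("EI0", 2, 0, False, 1), Pic("B1e", 2, 1, True, 2), Pic("EP2", 2, 2, False, 1),
]
print(b_placement_violations(["I0", "EI0", "P2", "EP2", "B1", "B1e"], pics))  # []
print(b_placement_violations(["I0", "EI0", "B1", "P2", "EP2", "B1e"], pics))  # [('B1', '1/2/4')]
pics4 = pics + [Pic("P4", 1, 4, False, 1)]
print(b_placement_violations(["I0", "EI0", "P2", "EP2", "P4", "B1", "B1e"], pics4))
# [('B1', '3'), ('B1', '5')]
```

Sending B1 before P2 and EP2 breaks rules 1 and 2, while sending a later base-layer picture P4 before B1 breaks rules 3 and 5.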

Figure O.5 illustrates two allowable picture transmission orders given by the rules above for the
layering structure shown therein (with numbers in dotted-line boxes indicating the bitstream order, 
separated by commas for the two alternatives). 
Figure O.5/H.263 - Example of picture transmission order